Lemma 8
Parameter-Efficient Generative Modeling with Controlled Vector Fields
We introduce a continuous-time generative modeling framework, motivated by the Chow-Rashevskii theorem, that builds expressive flows from a small set of fixed vector fields and learned scalar controls. Instead of learning an unconstrained high-dimensional vector field, our framework constructs the velocity by modulating the fixed vector fields with learned scalar control functions. When the fixed fields are bracket-generating, the Lie algebra they generate spans the tangent space at every point. This provides a mechanism for expressive transport with only a small number of learned control channels and offers a parameter-efficient geometric alternative to standard vector-field parameterizations. The decoupled formulation yields a structured and interpretable generative model in which the number of learned scalar output channels can be chosen independently of the ambient dimension. We formulate an expressivity principle showing that, under suitable controllability and well-posedness assumptions, such controlled flows can transport a source distribution to a target distribution. We train the resulting model with a continuous-normalizing-flow likelihood objective and present proof-of-concept experiments on synthetic distributions.
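To make the bracket-generating idea concrete, here is a minimal NumPy sketch under stated assumptions. The fixed fields are a hypothetical choice (the Heisenberg fields on R^3, not taken from this work), and the "learned" controls are hypothetical time-only functions standing in for networks. The sketch checks numerically that two fields plus their Lie bracket span R^3, then integrates the controlled velocity v = Σ_i u_i f_i with forward Euler. The fields, control parameterization, and ODE solver used in the actual experiments may differ.

```python
import numpy as np

# Hypothetical fixed fields on R^3 (Heisenberg fields). At any point they span
# only a 2-D plane, but their Lie bracket supplies the missing direction.
def f1(p):
    x, y, _ = p
    return np.array([1.0, 0.0, -0.5 * y])

def f2(p):
    x, y, _ = p
    return np.array([0.0, 1.0, 0.5 * x])

def jacobian(f, p, eps=1e-5):
    """Central-difference Jacobian with J[i, j] = d f_i / d p_j."""
    p = np.asarray(p, dtype=float)
    cols = []
    for j in range(p.size):
        e = np.zeros_like(p)
        e[j] = eps
        cols.append((f(p + e) - f(p - e)) / (2 * eps))
    return np.stack(cols, axis=1)

def lie_bracket(f, g, p):
    """Lie bracket of vector fields: [f, g](p) = Dg(p) f(p) - Df(p) g(p)."""
    return jacobian(g, p) @ f(p) - jacobian(f, p) @ g(p)

def controlled_velocity(p, t, controls, fields):
    """v(p, t) = sum_i u_i(p, t) f_i(p): scalar controls modulate fixed fields."""
    return sum(u(p, t) * f(p) for u, f in zip(controls, fields))

def integrate(p0, controls, fields, t1=1.0, steps=100):
    """Forward-Euler integration of dp/dt = v(p, t) from t = 0 to t = t1."""
    p, dt = np.asarray(p0, dtype=float), t1 / steps
    for k in range(steps):
        p = p + dt * controlled_velocity(p, k * dt, controls, fields)
    return p

# Hypothetical scalar controls (placeholders for learned control networks).
controls = [lambda p, t: np.cos(t), lambda p, t: np.sin(t)]
fields = [f1, f2]

p = np.array([0.3, -0.2, 0.1])
bracket = lie_bracket(f1, f2, p)  # approximately (0, 0, 1) at every p
span = np.stack([f1(p), f2(p), bracket])
print(np.linalg.matrix_rank(span))
print(integrate([0.0, 0.0, 0.0], controls, fields))
```

Only two scalar channels are learned here, yet the bracket direction makes all of R^3 reachable. For a likelihood objective, the divergence of the controlled field follows from the product rule, div v = Σ_i (∇u_i · f_i + u_i div f_i). Both fields above are divergence-free, so only the first term would remain.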
Supplementary Material for: An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap
We first verify the statement for the terminal state $f$. At $f$, regardless of the action taken, the next state is always $f$ and the reward is always $0$. Hence $Q^*_h(f,\cdot) = V^*_h(f) = 0$ for all $h \in [H]$, and thus $Q^*_h(f,\cdot) = \langle \phi(f,\cdot), v(a^*) \rangle = 0$. We now verify realizability for the other states by induction on $h = H, H-1, \ldots, 1$. Note also that, for every $h$, (2) follows from (1). In particular, (1) implies that $a^*$ is always the optimal action.
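The base case can be checked with a minimal backward-induction sketch. The MDP below is a hypothetical random instance, not the hard instance constructed in the paper. Its terminal state `f` is absorbing with zero reward, so every Bellman backup, computed in the order h = H, H-1, ..., 1 (0-indexed in the code), keeps the optimal Q-values at `f` equal to zero.

```python
import numpy as np

def optimal_q(P, R, H):
    """Finite-horizon backward induction.

    P[s, a, s'] are transition probabilities and R[s, a] are rewards.
    Returns Q with Q[h, s, a] = Q*_h(s, a) for 0-indexed steps h = 0..H-1,
    filled in the order h = H-1, H-2, ..., 0.
    """
    S, A, _ = P.shape
    Q = np.zeros((H, S, A))
    V_next = np.zeros(S)  # value beyond the horizon is zero
    for h in reversed(range(H)):
        # Bellman backup: R(s, a) + sum_{s'} P(s' | s, a) V*_{h+1}(s')
        Q[h] = R + P @ V_next
        V_next = Q[h].max(axis=1)
    return Q

# Hypothetical toy MDP: the last state plays the role of the terminal state f.
rng = np.random.default_rng(0)
S, A, H = 4, 3, 5
f = S - 1
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
P[f] = 0.0
P[f, :, f] = 1.0  # f is absorbing under every action
R = rng.random((S, A))
R[f] = 0.0        # reward at f is always 0

Q = optimal_q(P, R, H)
print(Q[:, f, :])  # all zeros: Q*_h(f, .) = V*_h(f) = 0 for every h
```

Because every backup at `f` only adds the zero reward to the next-step value at `f`, zero is preserved at each step of the induction. With a zero feature vector at `f`, any linear parameter then reproduces these values.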
Contents of Appendix
A Extended Literature Review
B Time Uniform Lasso Analysis
C Results on Exploration
C.1 ALE
Table 2 compares recent work on sparse linear bandits along several important factors. Some of the mentioned bounds depend on problem-dependent parameters. For example, Carpentier and Munos [2012] assume that the action set is a Euclidean ball and that the noise is added directly to the parameter vector, i.e. …; in this setting, Carpentier and Munos [2012] present an $O(d\sqrt{n})$ regret bound. Li et al. [2022] require a stronger condition, …; this condition is generally not true, but may hold with high probability.
In this section we provide a detailed proof of the main theorem. We first state some facts about the learning rate and the algorithm. The resulting bound contains three parts. The first is an upper bound for the first step, when no data is yet available. The third part is an "average" of the estimated future regret.